
1,448 research outputs found

MPCI : An R Package for Computing Multivariate Process Capability Indices

Author: Edgar Santos-Fernandez
Michele Scagliarini
Publication venue: Foundation for Open Access Statistics
Publication date: 01/01/2012

Manufacturing processes are often based on more than one quality characteristic. When these variables are correlated, the process capability analysis should be performed using multivariate statistical methodologies. Although there is growing interest in methods for evaluating the capability of multivariate processes, little attention has been given to developing user-friendly software for supporting multivariate capability analysis. In this work we introduce the package MPCI for R, which allows users to compute multivariate process capability indices. MPCI aims to provide a useful tool for dealing with multivariate capability assessment problems. We illustrate the use of the MPCI package through both simulated and real examples.

Directory of Open Access Journals

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Journal of Statistical Software

Graph Neural Network-Based Anomaly Detection for River Network Systems

Author: Buchhorn Katie
Mengersen Kerrie
Salomone Robert
Santos-Fernandez Edgar
Publication date: 31/05/2023

Water is the lifeblood of river networks, and its quality plays a crucial role in sustaining both aquatic ecosystems and human societies. Real-time monitoring of water quality is increasingly reliant on in-situ sensor technology. Anomaly detection is crucial for identifying erroneous patterns in sensor data, but can be a challenging task due to the complexity and variability of the data, even under normal conditions. This paper presents a solution to the challenging task of anomaly detection for river network sensor data, which is essential for accurate and continuous monitoring. We use a graph neural network model, the recently proposed Graph Deviation Network (GDN), which employs graph attention-based forecasting to capture the complex spatio-temporal relationships between sensors. We propose an alternate anomaly scoring method, GDN+, based on the learned graph. To evaluate the model's efficacy, we introduce new benchmarking simulation experiments with highly sophisticated dependency structures and subsequence anomalies of various types. We further examine the strengths and weaknesses of this baseline approach, GDN, in comparison to other benchmarking methods on complex real-world river network data. Findings suggest that GDN+ outperforms the baseline approach in high-dimensional data, while also providing improved interpretability. We also introduce software called gnnad.

arXiv.org e-Print Archive

Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective

Author: Mengersen Kerrie
Peterson Erin E.
Rushworth Em
Santos-Fernandez Edgar
Vercelloni Julie
Publication date: 01/06/2020

Many research domains use data elicited from "citizen scientists" when a direct measure of a process is expensive or infeasible. However, participants may report incorrect estimates or classifications due to their lack of skill. We demonstrate how Bayesian hierarchical models can be used to learn about latent variables of interest, while accounting for the participants' abilities. The model is described in the context of an ecological application that involves crowdsourced classifications of georeferenced coral-reef images from the Great Barrier Reef, Australia. The latent variable of interest is the proportion of coral cover, which is a common indicator of coral reef health. The participants' abilities are expressed in terms of sensitivity and specificity of a correctly classified set of points on the images. The model also incorporates a spatial component, which allows prediction of the latent variable in locations that have not been surveyed. We show that the model outperforms traditional weighted-regression approaches used to account for uncertainty in citizen science data. Our approach produces more accurate regression coefficients and provides a better characterization of the latent process of interest. This new method is implemented in the probabilistic programming language Stan and can be applied to a wide range of problems that rely on uncertain citizen science data.
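
The role of sensitivity and specificity in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' Bayesian hierarchical model (which is written in Stan and includes a spatial component); it only shows the standard point-level relation between a true proportion and the proportion a fallible classifier would report, plus the simple algebraic inversion of that relation. The function names and the numeric values are assumptions chosen for illustration.

```python
def prob_reported_positive(true_proportion, sensitivity, specificity):
    """Probability that a participant labels a random point as coral.

    A point is reported as coral either because it is coral and was
    detected (sensitivity) or because it is not coral and was
    misclassified (1 - specificity).
    """
    return (sensitivity * true_proportion
            + (1.0 - specificity) * (1.0 - true_proportion))


def corrected_proportion(observed_proportion, sensitivity, specificity):
    """Invert the relation above to recover the true proportion.

    Only valid when sensitivity + specificity > 1, i.e. the participant
    does better than random guessing. The result is clipped to [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_proportion + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


# Hypothetical values: 30% true coral cover, a participant with
# sensitivity 0.9 and specificity 0.8.
observed = prob_reported_positive(0.3, 0.9, 0.8)
recovered = corrected_proportion(observed, 0.9, 0.8)
```

In a hierarchical model, sensitivity and specificity would be estimated per participant rather than supplied as known constants as they are here.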

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

Bayesian spatio-temporal models for stream networks

Author: Hoef Jay M. Ver
Isaak Daniel
McGree James
Mengersen Kerrie
Peterson Erin E.
Santos-Fernandez Edgar
Publication date: 05/03/2021

Spatio-temporal models are widely used in many research areas including ecology. The recent proliferation of the use of in-situ sensors in streams and rivers supports space-time water quality modelling and monitoring in near real-time. In this paper, we introduce a new family of dynamic spatio-temporal models, in which spatial dependence is established based on stream distance and temporal autocorrelation is incorporated using vector autoregression approaches. We propose several variations of these novel models using a Bayesian framework. Our results show that our proposed models perform well using spatio-temporal data collected from real stream networks, particularly in terms of out-of-sample RMSPE. This is illustrated considering a case study of water temperature data in the northwestern United States.

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

Increasing trust in new data sources: crowdsourcing image classification for ecology

Author: Christensen Bryce
Heron Grace
Mengersen Kerrie
Peterson Erin E.
Price Aiden
Santos-Fernandez Edgar
Vercelloni Julie
Publication date: 01/05/2023

Crowdsourcing methods facilitate the production of scientific information by non-experts. This form of citizen science (CS) is becoming a key source of complementary data in many fields to inform data-driven decisions and study challenging problems. However, concerns about the validity of these data often constrain their utility. In this paper, we focus on the use of citizen science data in addressing complex challenges in environmental conservation. We consider this issue from three perspectives. First, we present a literature scan of papers that have employed Bayesian models with citizen science in ecology. Second, we compare several popular majority vote algorithms and introduce a Bayesian item response model that estimates and accounts for participants' abilities after adjusting for the difficulty of the images they have classified. The model also enables participants to be clustered into groups based on ability. Third, we apply the model in a case study involving the classification of corals from underwater images from the Great Barrier Reef, Australia. We show that the model achieved superior results in general and, for difficult tasks, a weighted consensus method that uses only groups of experts and experienced participants produced better performance measures. Moreover, we found that participants learn as they have more classification opportunities, which substantially increases their abilities over time. Overall, the paper demonstrates the feasibility of CS for answering complex and challenging ecological questions when these data are appropriately analysed. This serves as motivation for future work to increase the efficacy and trustworthiness of this emerging source of data.
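
The difference between a plain majority vote and a weighted consensus, both mentioned in the abstract above, can be sketched in a few lines of Python. This is an illustrative sketch only: the weights below are hypothetical ability scores supplied by hand, whereas the paper estimates abilities with a Bayesian item response model, and the function names are assumptions.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the label reported most often (ties resolved arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Return the label with the largest total weight.

    Each participant's vote counts in proportion to a weight, here a
    hypothetical ability score between 0 and 1.
    """
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    tally = defaultdict(float)
    for label, weight in zip(labels, weights):
        tally[label] += weight
    return max(tally, key=tally.get)


# Three participants classify the same image point. The first is
# assumed to be an expert, the other two are assumed to be novices.
votes = ["coral", "sand", "sand"]
abilities = [0.9, 0.3, 0.3]

plain = majority_vote(votes)                # two of three votes say "sand"
weighted = weighted_vote(votes, abilities)  # 0.9 for "coral" vs 0.6 for "sand"
```

The example is built so that the two rules disagree, which is the situation in which accounting for participant ability matters.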

arXiv.org e-Print Archive

clusterBMA: Bayesian model averaging for clustering

Author: Forbes Owen
Hermens Daniel F.
Lagopoulos Jim
Mengersen Kerrie
Mills Lia
Sacks Dashiell D.
Santos-Fernandez Edgar
Schwenn Paul E.
Wu Paul Pao-Yen
Xie Hong-Bo
Publication date: 01/01/2023

Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model. From a consensus matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters, combining allocation probabilities from 'hard' and 'soft' clustering algorithms, and measuring model-based uncertainty in averaged cluster allocation. This method is implemented in an accompanying R package of the same name.
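
The consensus matrix described in the abstract above, a weighted average of clustering solutions, can be sketched as follows. This is a minimal Python illustration for 'hard' clusterings only, not the clusterBMA R package: the weights are hypothetical numbers supplied by hand rather than the approximate posterior model probabilities the paper derives from internal validation criteria, and the symmetric simplex matrix factorisation step is not shown. The function names are assumptions.

```python
def coassignment_matrix(labels):
    """Pairwise matrix: 1.0 where two points share a cluster, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def weighted_consensus(clusterings, weights):
    """Weighted average of co-assignment matrices across clusterings.

    `clusterings` is a list of label vectors over the same points and
    `weights` holds one non-negative weight per clustering. Weights are
    normalised so that the result lies in [0, 1].
    """
    if len(clusterings) != len(weights):
        raise ValueError("need one weight per clustering")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(clusterings[0])
    consensus = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(clusterings, weights):
        matrix = coassignment_matrix(labels)
        for i in range(n):
            for j in range(n):
                consensus[i][j] += (weight / total) * matrix[i][j]
    return consensus


# Two hypothetical clusterings of three points, with assumed weights.
solutions = [[0, 0, 1], [0, 1, 1]]
model_weights = [0.75, 0.25]
consensus = weighted_consensus(solutions, model_weights)
```

Each entry of the result is the weighted share of models that place the two points in the same cluster, so cluster labels do not need to be aligned across models.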

arXiv.org e-Print Archive

Directory of Open Access Journals

Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences

Author: Andrei L. Turinsky
Andrew R. Gehrke
Gwenael Badis
Jennifer Tsai
Martha L. Bulyk
Michael F. Berger
Miguel A. Santos
Serene Ong
Shaheynoor Talukder
Shoshana J. Wodak
Timothy R. Hughes
Publication venue: Oxford University Press (OUP)
Publication date: 01/07/2010

Classifying proteins into subgroups with similar molecular function on the basis of sequence is an important step in deriving reliable functional annotations computationally. So far, however, available classification procedures have been evaluated against protein subgroups that are defined by experts using mainly qualitative descriptions of molecular function. Recently, in vitro DNA-binding preferences to all possible 8-nt DNA sequences have been measured for 178 mouse homeodomains using protein-binding microarrays, offering the unprecedented opportunity of evaluating the classification methods against quantitative measures of molecular function. To this end, we automatically derive homeodomain subtypes from the DNA-binding data and independently group the same domains using sequence information alone. We test five sequence-based methods, which use different sequence-similarity measures and algorithms to group sequences. Results show that methods that optimize the classification robustness reflect well the detailed functional specificity revealed by the experimental data. In some of these classifications, 73–83% of the subfamilies exactly correspond to, or are completely contained in, the function-based subtypes. Our findings demonstrate that certain sequence-based classifications are capable of yielding very specific molecular function annotations. The availability of quantitative descriptions of molecular function, such as DNA-binding data, will be a key factor in exploiting this potential in the future.

Funding: Canadian Institutes of Health Research (MOP#82940); Sickkids Foundation; Ontario Research Fund; National Science Foundation (U.S.); National Human Genome Research Institute (U.S.) (R01 HG003985)

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

The multifaceted roles of perlecan in fibrosis

Author: Fengying Tang
James G.W. Smith
James Melrose
Jelena Rnjak-Kovacina
John M. Whitelock
Megan S. Lord
Publication venue: Elsevier BV
Publication date: 01/08/2018

Perlecan, or heparan sulfate proteoglycan 2 (HSPG2), is a ubiquitous heparan sulfate proteoglycan that has major roles in tissue and organ development and wound healing by orchestrating the binding and signaling of mitogens and morphogens to cells in a temporal and dynamic fashion. In this review, its roles in fibrosis are examined by drawing upon evidence from tissue and organ systems that undergo fibrosis as a result of an uncontrolled response to either inflammation or traumatic cellular injury, leading to an overproduction of a collagen-rich extracellular matrix. This review focuses on examples of fibrosis that occur in lung, liver, kidney, skin, neural tissues and blood vessels, and on the link between fibrosis and the expression of perlecan in that particular organ system.

Crossref

University of East Anglia digital repository